ABSTRACT

Did the Third International Mathematics and Science Study
(TIMSS) ask eighth grade science teachers the right questions about their use
of instructional time? TIMSS asked teachers to recall a lesson that they had
taught, and then group activities into 11 categories. This study examined the
TIMSS question "How did the lesson proceed?" by videotaping six classes of
eighth grade science in Alabama and Virginia and comparing observer coding of
the video to the teachers' recalled descriptions of the same class. The
manner in which the TIMSS data were collected and the manner in which data
were collected from teachers in this videotape study suggested the use of a
repeated measures analysis of variance (ANOVA) model. Using a repeated
measures analysis allowed us to look at the interactions between teachers and
observer and the 11 TIMSS activities, the 26 NSF student activities, and the
11 NSF teacher activities. The difference between observer and teacher
responses using TIMSS categories was not significant; however, 43% of the
total variance was explained by whether the teacher or the observer reported
the times for the instructional activities. The teachers also responded to
questions from the NSF Local Systemic Change Through Teacher Enhancement K-8
Teacher Questionnaire to describe the same class. The difference found
between the teacher and the observer coding was not significant, but the
amount of variance explained by the data source (observer or teacher) dropped
to 33% when using NSF student activity categories and to 26% when using NSF
teacher activity categories. The study concluded that questionnaires to
survey science teachers about their instructional activities should include
operational definitions, methods of classifying single activities into
multiple instructional categories, and questions that are more accurate in
describing quality science instructional activities. (Contains 43
references.) (Author/SAH)

Eighth Grade Science Teachers Use of Instructional Time:

Comparing Questions from the Third International Mathematics and Science Study (TIMSS) 
and National Science Foundation Questionnaires 

Abstract 

Did the Third International Mathematics and Science Study (TIMSS) ask eighth grade science 
teachers the right questions about their use of instructional time? TIMSS asked teachers to recall a lesson 
that they had taught, and then group activities into 11 categories. This study examined the TIMSS
question “How did the lesson proceed?” by videotaping six classes of eighth grade science in Alabama 
and Virginia and comparing observer coding of the video to the teachers’ recalled descriptions of the 
same class. 

The manner in which the TIMSS data were collected and the manner in which data were collected
from teachers in this videotape study suggested the use of a repeated measures analysis of variance
(ANOVA) model. Using a repeated measures analysis allowed us to look at the interactions between
teachers and observer and the 11 TIMSS activities, the 26 NSF student activities, and the 11 NSF teacher
activities. 

The difference between observer and teacher responses using TIMSS categories was not 
significant; however, 43% of the total variance was explained by whether the teacher or the observer 
reported the times for the instructional activities. The teachers also responded to questions from the NSF 
Local Systemic Change Through Teacher Enhancement K-8 Teacher Questionnaire to describe the same 
class. The difference found between the teacher and the observer coding was not significant, but the 
amount of variance explained by the data source (observer or teacher) dropped to 33% when using NSF 
student activity categories and to 26% when using NSF teacher activity categories. 

The study concluded that questionnaires to survey science teachers about their instructional 
activities should include operational definitions, methods of classifying single activities into multiple 
instructional categories, and questions that are more accurate in describing quality science instructional 
activities. 

Introduction 

Teaching science effectively is a complex process. Identifying and defining methods to 
accurately record and quantify the many interconnecting aspects of effective science teaching 
would be useful for many purposes, including international comparison of science teaching and 
improvement of science instruction. 

How can we measure quality science instruction with written instruments or 
observations? What questions should we ask to determine if a science teacher is really using 
techniques and strategies that follow goals described by the current science education reform 
movement? To have an effective instrument to measure quality of science instruction would 
allow large numbers of teachers to be surveyed with reasonable expenditures of time and effort. 
Educators and policy makers could then use such information to thoughtfully improve the quality 
of science education. However, to make such important decisions using an instrument that does 
not accurately report teaching practices is most certainly counterproductive. 

Since the release of the Third International Mathematics and Science Study (TIMSS) 
reports in 1996, many educators, politicians, business leaders, and parents have 
used the reports to make sweeping statements about how U.S. science and math education should 
be changed. Schools cannot easily change culture or students' home environment. However, 
there is some control over the direction of teacher and school change. The development of 
instruments that enable teachers to self-report instructional activities accurately is also a 
necessary step to track the adoption of science education reform ideas into the classroom. 

The purpose of this study is to analyze the TIMSS question “How did the lesson 
proceed?” (IEA, 1995b). Six classes of eighth grade science in Alabama and Virginia were
videotaped. After the classes were taped, the teachers were asked to complete a questionnaire
composed of sections from the TIMSS Population 2 Science Teacher Questionnaire
(Appendix A) and sections from the NSF Local Systemic Change through Teacher Enhancement
1998 Teacher Questionnaire K-8 Science instrument (Horizon Research, Inc., 1998). Their
answers were compared to quantitative coding of their videotape to determine the difference in
the real and perceived duration and occurrence of instructional activities and which of the
questionnaires most accurately represented classroom practices.

The purpose of this study was to extend the analysis of science teacher data by focusing
on these two questions:

1. Is there a significant difference in the real and perceived occurrence and duration of
instructional activities of U.S. eighth grade science teachers as measured by the
TIMSS questionnaire?

2. Is the use of instructional time in U.S. eighth grade science classrooms more
accurately reflected by teachers' responses to the TIMSS questionnaire or the
National Science Foundation (NSF) Local Systemic Change through Teacher Enhancement
1998 Teacher Questionnaire K-8 Science (Horizon Research, Inc., 1998)?



Review of the Literature 



Effective Science Teaching

A review of the literature indicates a concurrence on the most productive methods and
strategies to use in teaching science, as indicated in Table 1 (Brunning, Schraw, & Ronning,
1995; Tobin, Tippins, & Gallard, 1994; Woloshyn, 1995; and National Research Council, 1996).
In general, the trend seems to be a move toward more student-centered activities and away from
teacher-centered activities (Sealey, 1985). 

Hoffstein and Walberg (1995) found that instructional strategies can be located on a 
continuum. One end is teacher-centered, where the teacher is active and the student is less active, 
but not intellectually passive. The other end of the continuum is student-centered, where the 
student is more active in the learning process. They categorized strategies such as laboratory 
activities, inquiry techniques, small-group discussion, individualized learning, computer 
simulations, and field trips as student-centered. Lectures, classroom discussions, demonstrations, 
and questioning techniques were described as more teacher-centered. 

Constructivism is the proposition that we construct our own understanding of the world 
through reflection upon our interactions with objects and ideas. We synthesize new experiences 
and information into what we have previously come to understand. Learning takes place by the 
examination of new information and comparisons to our mental models. We then accept, reject, 
or modify the new information (Brooks & Brooks, 1993; Stein et al., 1994). Hendry and King 
(1994) pointed out the ineffectiveness of attempting to teach science by the transmission mode. 

A teacher cannot expect all students to learn the same thing at the same time given the same 
classroom experiences due to differences of experiences that individuals bring to the classroom. 

Naive science is defined as the preconceptions that students have concerning scientific
phenomena. Naive science can be very difficult to alter with instruction unless appropriate
learning experiences take place. Students' misconceptions tended to be very powerful, in some
areas even negating direct evidence they observed in experimental and classroom settings
(McCloskey, Caramazza, & Green, 1980).
Table 1

Effective Science Teaching Methods

More Effective                                     Less Effective

Constructivism                                     Transmission
Inquiry or problem solving                         Lecture/Note taking/Test
Teacher as facilitator and role model              Teacher as source of knowledge
Shift to internal locus                            External locus
Opportunities to learn independently               Whole-class learning
Freedom of interaction with materials              Cookbook labs
Student-structured learning strategies             Teacher-structured learning
Problem-solving laboratories                       Technical skills or verification labs
Hands-on                                           Teacher demonstrations
Wait-time                                          Competitive answering
Reflective thinking                                Lower cognitive level thinking
Cooperative learning                               Competitive, individual learning
Performance and portfolio assessment               Test-driven learning
Instructional experiences outside the classroom    Field trips
Increased use of technology                        Limited technology
Students actively engaged                          Students as passive learners

Note: Compiled from Brunning et al., 1995; Tobin et al., 1994; Woloshyn, 1995; and
National Research Council, 1996.

Wait Time. Rowe (1969) examined audiotapes of exceptional science lessons in which
students were demonstrating high levels of inquiry. She noted the value of using an average wait
time of 3 to 5 seconds after asking a question. Extending wait time to 3 to 5 seconds has
subsequently been studied in science classes and has consistently been shown to improve science
comprehension and achievement (Rowe, 1974, 1991). In a meta-analysis concerning the effect of
instructional techniques on science achievement, wait time was found to be the most powerful 
technique employed by teachers to increase students' cognitive outcomes, critical thinking, 
creative thinking, and positive attitude (Wise & Okey, 1983). 

Questioning Techniques. Swift and Gooding (1983) found that one result of increased
wait time was that teachers more frequently used evaluative questions. Tobin (1984) found that
most of the changes resulting from increasing wait time were in teacher behavior. For example,
when teachers asked more probing questions, the students had to react to and evaluate the
responses of other students, and achievement increased. Use of Bloom's Taxonomy to guide
questions in the science classroom has been suggested by Gilbert (1992). Analysis, synthesis,
and evaluation have been labeled as higher order thinking skills or questions (Bloom, Engelhart,
Furst, Hill, & Krathwohl, 1956). The use of higher order questions can facilitate the inquiry
method, demonstrate how the students have constructed their learning, and facilitate making the
science classroom more student-centered.

Inquiry. Brunning et al. (1995) concluded that science must be learned through a
problem-solving process, including the development and testing of hypotheses. The science
curriculum restructuring of the 1960s and 1970s, strongly associated with inquiry methods, was
examined in a meta-analysis of 81 studies involving over 40,000 K-12 students (Shymansky,
Hedges, & Woodworth, 1990; Shymansky, Kyle, & Alport, 1983). The researchers
concluded that this curriculum, as opposed to the traditional textbook-based programs, enhanced
students' science achievement and process skills, and showed a positive effect on students’
attitudes about science.

The National Science Education Standards (NSES, National Research Council,
1996) has identified a number of teaching models that strongly endorse inquiry as the focus of
science instruction. Scientific inquiry has also been identified as a primary method of effective
science instruction for use in science education reform by the National Science Teachers
Association (NSTA) and the American Association for the Advancement of Science (AAAS,
1993; and Aldridge, 1995).

Cooperative Learning is one of the most extensively researched and widely accepted
methods to structure science classrooms so that meaningful inquiry can take place (Johnson,
Johnson, & Holubec, 1993). This teaching/learning strategy is encouraged in the National
Science Education Standards as a successful strategy for student laboratory investigations
(National Research Council, 1996). A variety of formal and informal cooperative strategies have
been shown to be effective in improving student achievement and attitudes toward school,
increasing the frequency of students sharing ideas, and improving student relations (Kagan, 1985;
Lazarowitz, Hertz-Lazarowitz, & Baird, 1994; Scheurman, 1998; Slavin, 1987, 1991).

Multiple Intelligences. Howard Gardner, after studying brain-damaged patients, normal
children, prodigies, idiot savants, autistic children, and children with learning disabilities,
proposed the theory of Multiple Intelligences (MI) in 1983. Gardner theorized that humans have
these seven intelligences: linguistic, logical-mathematical, spatial, bodily-kinesthetic, musical,
interpersonal, and intrapersonal. Gardner later added an eighth intelligence, naturalistic (Checkley,
1997).
Educators such as Thomas Armstrong (1994) extended the MI theory into the classroom.
Learning styles are the manifestations of these eight intelligences operating in natural learning 
contexts. Each individual potentially has all intelligences, but one or more will be more highly 
developed than the others. Armstrong categorized the typical science teaching techniques as 
addressing the Logical-Mathematical Intelligence through scientific demonstrations, logical 
problem-solving exercises, calculations, scientific thinking, and logical-sequential presentation 
of subject matter. To allow each student the opportunity to involve their most highly developed 
intelligence in learning across the disciplines, he suggests that instructors use a variety of 
teaching styles from lesson to lesson. 

Third International Mathematics and Science Study (TIMSS)

TIMSS was a project sponsored by the International Association for the Evaluation of 
Educational Achievement (IEA). TIMSS was the largest, most comprehensive, and most 
rigorous international comparison of education ever conducted, in any discipline of study. The 
TIMSS science achievement test was developed by an international group of National Research 
Coordinators to cover appropriate content, performance expectations, and perspectives. The 
TIMSS Subject Matter Advisory Committee, with representatives from 10 countries, ensured 
that the test reflected current thinking and priorities in the sciences. The instruments were tested 
to ensure that there was no bias against any country (Beaton et al., 1996).

The United States invested approximately $30 million to participate in TIMSS in 1995-1996.
TIMSS was repeated with some extensions and modifications in 1998-1999 as TIMSS-R,
using only the Population 2 (middle school) level. The United States will invest an additional
$30 million in TIMSS-R, which will focus on the eighth grade level (Orland, 1998).
During the 1995 school year, TIMSS tested the math and science knowledge of a
half-million students from 41 nations at five different grade levels:

1. Population 1 in the TIMSS includes those students enrolled in the pair of adjacent 
grades that contained the most nine-year-olds. (Grades 3 and 4 in the U.S. and most of the world. 
Grades 2 and 3 in a few nations.) 

2. Population 2 in the TIMSS includes those students in the pair of adjacent grades that 
contained the most 13-year-olds at the time of testing. (Grades 7 and 8 in the U.S. and most of 
the world. Grades 6 and 7 in a few nations.) 

3. Population 3 in the TIMSS includes those students in their final year of secondary 
school, whatever their age. (Grade 12 in the U.S. and most nations. Grades 9-13 in some 
nations.) (U.S. Department of Education, 1996) 

A three-stage selection process was used to identify the sample in each of the 41 TIMSS 
countries: (a) schools with targeted ages of students were identified and divided 
into categories, if needed, to produce a representative sample, (b) classes were selected from 
each school, and (c) sub-samples of students were selected from classes, if necessary. Teachers 
sampled were those whose classes were selected in this process (U.S. Department of Education, 
1996). 
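
The three-stage selection described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `three_stage_sample`, the nested-dictionary sampling frame, and the use of simple random draws at every stage. The source does not state the selection probabilities TIMSS actually used, so this is not a reproduction of the TIMSS procedure.

```python
import random


def three_stage_sample(strata, schools_per_stratum, classes_per_school,
                       students_per_class=None, seed=0):
    """Illustrative three-stage draw (hypothetical helper, not the TIMSS code).

    strata: dict mapping stratum -> {school -> {class -> [students]}}
    Returns a list of (stratum, school, class, students) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = []
    for stratum, schools in sorted(strata.items()):
        # Stage (a): draw schools within each category (stratum).
        k_schools = min(schools_per_stratum, len(schools))
        for school in rng.sample(sorted(schools), k_schools):
            classes = schools[school]
            # Stage (b): draw classes within each selected school.
            k_classes = min(classes_per_school, len(classes))
            for cls in rng.sample(sorted(classes), k_classes):
                students = list(classes[cls])
                # Stage (c): sub-sample students only if a cap is given
                # and the class is larger than the cap.
                if (students_per_class is not None
                        and len(students) > students_per_class):
                    students = rng.sample(students, students_per_class)
                sample.append((stratum, school, cls, students))
    return sample
```

In this sketch the teachers surveyed would simply be those attached to each selected class, which mirrors the statement that teachers were sampled through their classes rather than drawn directly.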

The teachers of students tested at the Population 1 and 2 levels were also surveyed using 
questionnaires, interviews, and videotapes. In addition to tests and questionnaires, TIMSS 
included curriculum analysis, videotaped observations of mathematics classrooms, and case 
studies of policy issues. This approach was intended to not only compare student achievement, 
but also provide insight into how life in U.S. schools is different from that in other nations. A 
sample of Population 2 mathematics classes in Germany, Japan, and the United States was 
videotaped and coded quantitatively. However, the initial report contained less information about
science than mathematics classes due to the lack of full analyses of science teacher questionnaire
data and the absence of videotape data from science classes.

From analysis of TIMSS, U.S. students were above the international average at the fourth 
grade level in both mathematics and science. U.S. students were outperformed in science by 
students only in Korea (U.S. Department of Education, 1997). At the eighth grade level, U.S. 
students performed slightly above the international average in science, but slightly below the 
international average in mathematics (U.S. Department of Education, 1996). U.S. students in the 
twelfth grade scored below the international average and among the lowest of the 21 TIMSS 
nations participating in both mathematics and science general knowledge in the final year of 
secondary school (U.S. Department of Education, 1998). 

Comparison of Curricula. Valverde and Schmidt (1998) emphasized the concern that, 
although U.S. students do not start out behind students in other TIMSS countries in science and 
math achievement, they fall behind in the middle grades. Preliminary results indicate that the 
curriculum and instruction are critical in this decline. U.S. teachers cover more topics per grade 
than is common in most TIMSS countries. According to Valverde and Schmidt, "The unfocused 
curriculum" and "instructional practices [that] mirror the incoherent presentation of mathematics 
that characterizes our intended curriculum" contribute to lower scores in the Population 2 U.S. 
students. 

The final TIMSS video sample included 231 classrooms of Population 2 mathematics
students: 100 in Germany, 50 in Japan, and 81 in the United States. One lesson was videotaped
in each classroom (Stigler & Hiebert, 1997). The goals of this study were to learn how
eighth-grade math is taught in the United States, Germany, and Japan, to learn how U.S. teachers
view reform, and to determine whether they were implementing teaching reforms in their
classrooms. Standard videotaping procedures were developed. The teachers then filled out a
questionnaire to give more descriptive data about the lesson and math education reform methods
used (Stigler & Hiebert, 1997).

The preliminary results indicate differences in instructional practices in four areas: 

1. Solving problems is the primary goal for U.S. and German teachers. Understanding
mathematics is the main goal for Japanese teachers. U.S. and German lessons have two phases:
acquisition and application. In Japan, students engage in problem solving, and then reflect on
those solutions to increase their understanding.

2. The average U.S. mathematics lesson was found to be on the seventh-grade level by 
international standards; the average German lesson, at the eighth grade level; and the Japanese 
lesson, at the ninth grade international level. 

3. In the U.S. and Germany, more than 90 percent of the time is spent in practicing 
routine problems. In Japan, students spent the majority of their time inventing new solutions that 
require conceptual thinking about mathematics. 

4. Seventy percent of the U.S. teachers claim to be implementing the “reform” outlined
by the National Council of Teachers of Mathematics (1989). However, the tapes indicated only
surface features of reform, such as the use of manipulatives or cooperative groups. Japanese 
classes were seen to implement more of the reform concepts, such as inclusion of high-level 
mathematics, a clear focus on thinking and problem solving, and an emphasis on students 
deriving alternative solution methods and explaining their thinking. However, Japanese lessons 
include more lecturing and demonstration than the U.S. lessons. The use of calculators was never 
observed in Japanese classes (U.S. Department of Education, NCES, 1999). 
Analysis of Population 2 Mathematics Teacher Data. Williams and Jocelyn (1998)
described the TIMSS reports of eighth grade mathematics students and teachers using the
Rosenshine and Stevens Model of Effective Instruction. They further identified a subset of items
considered to reflect good teaching practices embodied in the National Council of Teachers of
Mathematics standards (Williams & Jocelyn, 1998; NCTM, 1989). The same question format
was used for both mathematics and science teachers in the TIMSS (IEA, 1995b), even though
Rosenshine and Stevens (1986) stipulated that their model is least applicable in subjects where
skills do not follow explicit steps or do not use a general skill that is applied repeatedly. This
model does not reflect the strategies of effective science teaching, as outlined by the National
Science Education Standards (NRC, 1996) or the review of literature.

Methods 

Limitations of the Study 

The limitations of this study included restricted access to classrooms, which depended on
administrative, teacher, and parental consent. All 6 teachers were volunteers, not a randomly
selected sample. Data were also limited to 6 science classes. The reader is cautioned against
generalizing this descriptive study to the larger population.

The researcher was a science educator. Bias that may be the result of this situation was 
minimized by using question items from instruments developed by recognized authorities in the 
field of science education evaluation and assessment (Horizon Research Inc., 1998). Both the 
TIMSS questionnaire and the NSF Local Systemic Change through Teacher Enhancement 1998 
Teacher Questionnaire K-8 Science were designed for science teachers of this grade level. To 
minimize observer bias, the criteria for the coding system were decided prior to the actual taping. 
The tapes were coded by using the questionnaire items as categories. The researcher was also a 
secondary school administrator who had three years of experience in classroom teacher
observation and evaluation. 

Selection of Schools and Teachers to Participate 

Two school systems, one in Alabama and one in Virginia, were selected to participate in 
the video study. These two systems were selected to vary the geographical areas, as well as 
school system organizational structures for the project. The Virginia school system was 
organized on the principle of site-based management. The Alabama school system was organized 
at the central school district level. Both systems cover a range of communities from suburban to 
rural. Permission was given by both systems. 

Confidentiality of the specific schools and teachers was maintained through the use of 
fictitious names. The teachers in Virginia were designated as Mr. VA-1, Mr. VA-2, and Ms. VA-3.
Teachers in Alabama were designated as Ms. AL-1, Ms. AL-2, and Mr. AL-3. When
mentioned, their respective schools were given the same designations; i.e., Mr. VA-1 teaches at 
VA-1 School, Ms. AL-2 teaches at AL-2 School. The researcher had no current direct 
professional relationship with any of the principals, teachers or students used in this study. 

After gaining permission from the Virginia school system, all middle school principals in 
the approved schools were sent a letter describing the study and requesting permission to video 
one science class and survey the teacher. Each principal who agreed submitted the name of an 
eighth grade science teacher volunteer. In the Virginia school system, three principal-teacher 
teams agreed to participate. For each teacher, one period of eighth grade science was randomly 
selected for videotaping. Because school board regulations required all parents to sign a
permission form before videotaping could take place, the teachers were told which class
was selected. The teachers then distributed the permission forms prior to the date selected for
taping. The few students who did not return signed permission forms were given an alternate
learning activity in another classroom on the day of taping.

For the Alabama data collection, all middle school principals in the school district were 
sent a letter describing the study and requesting permission to video one science class and 
interview the teacher. Three principals agreed and submitted the name of an eighth grade science 
teacher volunteer. All 3 teachers were contacted, and a class was randomly selected by drawing a
period number. Letters to parents explaining the project were sent home with each student in
each of the selected classes. Signed permission from each was not required, but notification of 
each parent prior to taping was. 

Demographic information for the 6 teachers participating is found in Table 2. 
Demographic information for the schools of the 6 participating teachers is reported in Table 3. 

All data were collected on Mondays, Tuesdays, and Thursdays, from February 22 through
March 25, 1999. Classes were taped as close together as possible, between the end of the first 
semester and the spring break. The research was conducted in the students' and teachers' regular 
educational setting, with minimal intrusion by the researcher. 

Selection of Items for Questionnaire 

The teacher questionnaire was divided into three parts for analysis: 

1. TIMSS activities are the 11 categories from the TIMSS question, “How did the lesson
proceed?” (BTBSTM01-11) (See Appendix A).

2. NSF student activities are the categories from question 13 of the NSF Local Systemic 
Change through Teacher Enhancement 1998 Teacher Questionnaire K-8 Science (Horizon 
Research, Inc., 1998). 
3. NSF teacher activities are the categories from question 12 of the NSF Local Systemic
Change through Teacher Enhancement 1998 Teacher Questionnaire K-8 Science (Horizon
Research, Inc., 1998).

These NSF questions were developed for NSF by Horizon Research, Inc. These questions 
were used as a standardized evaluation system to determine the impact of NSF's Local Systemic 
Change Initiative in more than 50 projects throughout the United States. The questions were
based on activities related to the national standards for reforming math and science education 
(HRI, 1998). 

For this study, the response selections were modified to include time in minutes for each 
activity (originally from the TIMSS questionnaire) and response categories (originally from the 
HRI instrument for "How often does this activity occur in that particular class over the entire 
course?"). The directions to the teachers were kept as close to the original directions as possible. 

Procedure for Videotaping

The format described for the TIMSS math video study was followed. One camera was 
used per classroom. It focused on what an "ideal" student would focus on, ordinarily the teacher 
(Stigler & Hiebert, 1997). Students were taped only if they interacted with the teacher. The
researcher acted as a non-participant observer. The requirement for parental permission or
notification of parents prior to videotaping aided in making the researcher less intrusive in
the classroom. All of the students and teachers participating knew why the researcher was taping, 
what would be taped, and how the information would be used. Although a few students showed 
some reaction to the camera, most of the students appeared to forget that taping was going on for 
long segments of the class. The camera was set up in the back or at the side of the classrooms, so 
the researcher was away from the students’ field of vision. 
Table 2

Demographics of Teachers Participating in the Video Study

Characteristic                            Number of Teachers

Gender
  Male                                    3
  Female                                  3
Age Range
  <30                                     1
  31-50                                   4
  >51                                     1
Years Teaching Experience
  <3                                      1
  4-9                                     3
  >10                                     2
Certified to Teach 8th Grade Science
  Yes                                     6
  No                                      0

Before classes were randomly selected, teachers told the researcher what would be taking 
place in the classes, so that a class was not selected that would be interrupted by a school 
assembly, a period-long test, etc. After selection of the class, the general comment was, “Well,
you can tape that class, but we’re ONLY going to be doing ____.” The activities of all 6 teachers
generally followed the prior plans discussed with the researcher. Also during these
conversations, teachers asked questions about the goals of the research and were generally 
reassured about the procedures for taping, questionnaire, and analysis. Specific questions that 
could impact a teacher’s activities during the lesson or how they answered the questionnaire, 
were not answered. Normal teaching activities and honest answers on the questionnaire were the 
goals of this process. 

Table 3 

Demographic Information of the Schools Participating in This Study 

School                             VA-1    VA-2    VA-3    AL-1    AL-2    AL-3 

No. Teachers                         56      77      80      40      45      46 
No. Administrators                    2       2       2       2       3       3 
No. Students                        757     922    1060     672     876     743 
% White, Non-Hispanic              59.6    70.1    44.6    93.5    75.9    76.3 
% Hispanic                          3.8     4.6    10.1     0.9     0.8     0.5 
% African-American                 32.0    21.7    38.1     5.7    19.3    17.9 
% Asian/Pacific Islander            2.8     2.9     5.0       0     0.3     0.1 
% Other Ethnicity                   1.8     0.8     2.2       0     3.7     5.1 
% Gifted                           16.4    18.2    12.5     2.1     6.5     3.2 
% Special Education                12.7    13.0    11.7    12.8     4.2    12.4 
% English as a Second Language        0     1.7     3.4       0     0.5       0 
% Reduced Lunch                     7.3     5.7     8.6     6.1     3.5     5.4 
% Free Lunch                       26.9    12.7    21.9    16.1     7.6    15.9 


The one-time presence of a researcher in the classroom was likely to affect the behavior 
of a teacher and students. Prior notification, explanation, answering all questions not likely to 
affect the data collection, and location of the videographer at the back or side of the classroom 
were the steps taken to minimize this effect. 

Coding of the Videotapes 

The tapes were viewed, and activities were coded and timed using the activities listed in 
the teacher questionnaire. Each tape was observed and coded for TIMSS Activities; then watched 
again and coded for NSF Student Activities; and, finally, watched and coded for NSF Teacher 
Activities. Coding reliability of the researcher was checked by comparison of like categories in 
the three coding systems. Teacher responses to the questionnaire were not read until after the 
researcher had coded the tapes to eliminate influence of the teacher’s perceptions. 

Analysis of the Data 

The statistical procedure chosen to analyze data in this current study was the repeated 
measures analysis of variance. The repeated measures model is a variation of the randomized 
block factorial. It uses a blocking technique to isolate nuisance variation while simultaneously 
evaluating two or more treatments and associated interactions. When repeated measures are used, 
the order of the treatment combination is randomized independently for each experimental unit 
(Kirk, 1982). The manner of collecting the data from teachers in this videotape study suggested the 
use of a repeated measures analysis of variance (ANOVA) model (Barcikowski and Robey, 
1984). Using a repeated measures multivariate analysis allowed us to look at interactions 
between teachers and observer and the 11 TIMSS activities, the 26 NSF student activities, and the 11 
NSF teacher activities. 

In this study, the teachers were the units of analysis, and the multiple questions were the 
repeated trials. The source of the data, whether teacher or observer, was the treatment, or between 
variable. Stevens (1996) identified the assumptions for a single group univariate repeated 
measures analysis as: 

1. Independence of the observations 

2. Multivariate normality 

3. Sphericity 

4. Homogeneity of variance/covariance matrices 

Although sphericity is not necessary for a multivariate repeated measures analysis, 
independence, normality, and equality of the variance/covariance matrices are required. A 
violation of the independence assumption is very serious. However, repeated measures analyses 
are fairly robust against violation of multivariate normality. Another advantage of choosing a 
multivariate analysis over a univariate analysis is that the multivariate is more sensitive in this 
case. It has a greater practical significance, and maintains a power of 1.000 (Stevens, 1996). 

Interpretation of the repeated measures multivariate analyses followed the format 
suggested by McLean and Ernest (1998): 

1. Statistical significance, or the evidence that the results are more extreme than would 
happen by chance. 

2. Effect-size interpretation, or practical significance, as described by Cohen 
(Stevens, 1996): η² = .01 small; η² = .06 medium; and η² = .14 large. 

3. The importance of the statistics as it relates to sample size. 

4. Analyses of the graphic representation of the data. 

5. A description of the video study classes. 

Results of the use of this combination of data analyses, rather than a simple 
analysis of variance between groups, furnished a more complete answer to the questions 
proposed by this study. Statistical significance was tested at the p = .05 level. Practical 
significance was determined by using the η² value. 

For the first research question, the comparison was between the teacher responses to the 
TIMSS questions/categories and the observed measurement of the questions/categories. Each 
teacher was a block; the reported and observed activities were the treatment combinations, used 
as the between variables. 

The second research question was analyzed using the descriptive results from a repeated 
measures analysis. Since not every lesson that each of the six teachers taught included every 
category of activities, a number of cells had no observations. This precluded the possibility of 
meeting the required inferential assumptions for the model. Thus, the repeated measures model 
was used to produce descriptive numerical data and a graphic representation of the results. The 
results were interpreted in terms of their practical significance. 

Results 

Teachers’ Report of Activities 

After reviewing the reports of the teachers, one area of interest was the multiple reporting 
of one activity in several different categories. The comparison of time reported versus the actual 
total time of the class observed is reported in Table 4. In reviewing Table 4, it is apparent that 
teachers reported single activities in multiple categories or over-reported time in five of the six 
reports using the TIMSS categories; in three of the six reports using the NSF Student categories; 
and in four of the six reports using the NSF Teacher categories. 

Table 4 

Comparison of Teachers’ Reported Activities by Questionnaire 

            Total Time Reported (Minutes)               Total Time 
            TIMSS         NSF Student    NSF Teacher    Observed Class 
Teacher     Activities    Activities     Activities     (Minutes) 

VA-1        55*           10             23             42 
VA-2        63*           40             57*            44 
VA-3        75            60             70             77 
AL-1        72*           201*           75*            54 
AL-2        121*          79*            70*            49 
AL-3        210*          395*           160*           46 

Note: * Indicates that time reported is greater than the total time of class observed 
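
The over-reporting counts cited for Table 4 can be re-derived from the table itself. The following Python sketch is an illustrative check only and was not part of the original analysis; the variable and function names are assumptions introduced here, and the minutes are copied from Table 4.

```python
# Minutes from Table 4: (TIMSS, NSF Student, NSF Teacher, observed class time).
TABLE_4 = {
    "VA-1": (55, 10, 23, 42),
    "VA-2": (63, 40, 57, 44),
    "VA-3": (75, 60, 70, 77),
    "AL-1": (72, 201, 75, 54),
    "AL-2": (121, 79, 70, 49),
    "AL-3": (210, 395, 160, 46),
}
INSTRUMENTS = ("TIMSS", "NSF Student", "NSF Teacher")


def count_over_reports(table):
    """Count, per instrument, the teachers whose reported total exceeds class time."""
    counts = {name: 0 for name in INSTRUMENTS}
    for *reported, observed in table.values():
        for name, minutes in zip(INSTRUMENTS, reported):
            if minutes > observed:
                counts[name] += 1
    return counts


print(count_over_reports(TABLE_4))
```

The counts agree with the asterisks in Table 4: five of six reports for TIMSS, three of six for NSF Student, and four of six for NSF Teacher.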
Repeated Measures Analysis of Variance of Teachers’ Reported TIMSS Categories. 

When using the TIMSS activity categories as the coding system, the source of the data 
(whether reported by the teacher or the observer) was not statistically significant, 
F(1,5) = 3.770, p = 0.110. These results are not unexpected, due to the small sample size (n = 6) 
and the number of zeroes reported in some of the categories. The SPSS software reads multiple 
zeroes as an empty cell, or no data recorded. However, the practical significance 
(η² = .430) suggests that 43% of the total variance can be explained by the source of the data, 
whether reported by the teacher or the observer. That is a large effect size. The differences 
between teacher and observer data are not more extreme than could happen by chance, but 43% 
of the total variance in this sample is explained by whether the data was reported by the teacher 
or the observer. The graph of the data is shown in Figure 1. 

Figure 1. Reported v. observed time, TIMSS activities. 

When using the NSF Student Activity categories, the difference in the source of the data, 
whether reported by the teacher or the observer, was not statistically significant, F(1,5) = 2.459, 
p = .178. The differences between teacher and observer data were not more extreme than could 
happen by chance. However, the practical significance (η² = .330) suggested that 33% of the 
total variance could be explained by the source of the data, whether reported by the teacher or the 
observer. That was a large effect size. This may not apply to the general population. The graph of 
the data is shown in Figure 2. 

Figure 2. Reported v. observed time, NSF student activities 

When using the NSF Teacher Activity categories, the source of the data (whether 
reported by the teacher or the observer) was not statistically significant, F(1,4) = 1.416, p = 
.300. The differences between teacher and observer data were not more extreme than could 
happen by chance. However, the practical significance (η² = .261) suggested that 26.1% of the 
total variance could be accounted for by the source of the data, whether reported by the teacher 
or the observer. That was a large effect size. This may not apply to the general population. The 
results are shown in Figure 3. 

Figure 3. Reported v. observed time NSF teacher activities. 
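
The three effect sizes reported above are consistent with the usual relation between an F ratio and partial eta squared, η² = (F × df1) / (F × df1 + df2). The Python sketch below is offered only as a check on the reported values. It assumes that the η² values are partial eta squared as printed by SPSS, which the text does not state, and the function name is hypothetical. The NSF Teacher F value is taken as 1.416, the reading consistent with the reported p and η².

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect          # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# (F, df_effect, df_error) as reported for each coding system.
reported = {
    "TIMSS": (3.770, 1, 5),
    "NSF Student": (2.459, 1, 5),
    "NSF Teacher": (1.416, 1, 4),
}

for name, (f_value, df1, df2) in reported.items():
    print(f"{name}: eta squared = {partial_eta_squared(f_value, df1, df2):.3f}")
```

Under that assumption the computed values round to .430, .330, and .261, matching the figures given in the text.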

In comparing Figures 1, 2, and 3, the reader should note that the scales on the 
Y axes were not the same in all 3 graphs. This is a function of the SPSS 7.5 Data Editor, and not 
an intentional attempt to mislead the reader (Huff, 1954). Comparison of 
the η² values may give a better measure of comparison. The variance explained by the 
source of the data, teacher or observer, compared to the questionnaire/coding system was: 

TIMSS = 43.0% 

NSF Student Activities = 33.0% 

NSF Teacher Activities = 26.1% 

Conclusions 

Is There a Significant Difference in the Real and Perceived Occurrence and Duration of 
Instructional Activities of U.S. Eighth Grade Science Teachers as Measured by the TIMSS 
Questionnaire? 

Due to the small sample size of only 6 classes, the repeated measures analysis of variance 
did not result in a statistically significant difference. However, descriptively we can see that 43% 
of the total variance is explained by whether the data is from the teacher or the observer. That is 
enough to raise questions about the validity of research based on self-reported data collection. 
Training of observers could also reduce variance in the observed/reported comparison of data. 
However, a review of the narrative report indicated specifically where teachers identified areas 
of concern with the questions: misunderstanding of the terminology and the reporting of the same 
activity in multiple categories. 

The 6 teachers who participated in this study volunteered, and knew that their 
questionnaire responses would be compared to their videotape. They had an incentive for 
accurately reporting what they remembered taking place in their classroom. If this sample of 
teachers showed such variance in reporting class activities, how much more variance would be in 
the much larger TIMSS sample? 

Figure 1 indicates that the teachers generally reported more time in the instructional 
activities than did the observer, using the 11 TIMSS categories. Teachers particularly reported 
more time than the observer in “Drill and recitation” and “Small group activities.” This is most 
likely due to misunderstanding of the definitions of the terminology and reporting the same 
activity in multiple categories. While it could indicate that the teachers over-reported categories 
that they felt were popular, it is more likely in this study to be a result of reporting multiple 
categories by the teachers and one category by the observer. It is interesting to note that 
this group of teachers reported less time in “Lab activities” than did the observer. The 
observer coded each activity under only one TIMSS category. Observer training, or addressing 
the issue of multiple categories prior to observations, could change the data in a study like this 
one. The same type of discrepancies could also have taken place with the TIMSS teachers in 
1995-1996. 

Research Question 2: Is the Use of Instructional Time in U.S. Eighth Grade Science Classrooms 
More Accurately Reflected by Teachers' Responses to the TIMSS Questionnaire or the National 
Science Foundation (NSF) 1998 Teacher Questionnaire K-8 Science? 

The practical significance measures and the graphs show that there is a difference in the 
reported and observed activities in a science classroom, even on a sample as small as six classes. 
According to these results, the NSF Teacher Activities elicit responses that are the most like 
what was observed to take place in the classroom, followed by the NSF Student Activities. The 
TIMSS questions produced results most different from that of the observer. Possible reasons for 
these results include: 

1. Teachers may be more aware of what they are doing than of what their students are 
doing. Therefore, a higher degree of concurrence with the NSF Teacher Activities is not 
surprising. 

2. There were 26 student activity categories used from the NSF questionnaire; 11 NSF 
teacher activity categories; and 11 TIMSS categories. With 26 categories, there are more 
possibilities of disagreement as to which category is the most correct for a given activity. There 
is also a greater possibility that the learning activity could be described by multiple categories. 
One’s first reaction to looking at the graph of NSF student activity results in Figure 2 might be, 
“There are too many categories!” However, the NSF student activity categories produced less 
variance in teacher versus observer responses than the TIMSS categories. 

Further Recommendations 

The quick answer to improving research on science teacher instructional practices might 
be to videotape all data collection. However, the cost is just one factor that makes video analysis 
on a wide-scale basis prohibitive. Videotaping a class alters what normally goes on in the 
classroom. Permission must first be obtained from the teachers and the parents. Teachers who are 
not confident in their performance or their employment may oppose video taping of their class. 
Parents could resist extensive video taping of their child for a number of reasons, one of which is 
privacy. Questionnaires can produce a large amount of data in a relatively short time. For these 
reasons, we should continue to improve the methodology of data collection on instructional 
practices using questionnaires. The following recommendations are based on this study. 

Develop an instrument that differentiates between effective and less effective science 
teaching activities. The TIMSS science teacher questionnaire is not asking teachers the questions 
to best describe their science teaching practices. If students perform lab activities, is it an inquiry 
or a verification lab activity? If teachers respond that they are using small group activities, what 
type of cooperative learning structure is used in these small groups? The instructions to teachers 
in the TIMSS questionnaire indicate that completion of the questionnaire should take about one 
hour. When we ask thousands of teachers to donate an hour of their time to answer research 
questions, we should have the best questions possible. 

Develop operational definitions for science teacher questionnaires about instructional 
activities. Researchers cannot assume that all teachers agree on the definitions of pedagogical 
terms. Some terms caused confusion on all three questionnaires in this study. One teacher noted 
that “recitation” could refer to all discussion between students and teachers, and answered the 
question accordingly. When does a “review of the previous lesson” become “drill and 
recitation”? “Simulation” was another term that was obviously defined differently by the teacher and 
the researcher. The teacher and observer had different views on whether or not placing labels on 
a diagram was a “simulation.” Another term interpreted differently by teacher and observer was 
“performance assessment.” Operational definitions for each category could alleviate the problem 
of misunderstanding terminology. 

Develop procedures for classifying instructional activities into more than one category in 
teacher questionnaires. Teachers noted that some activities fell into more than one category 
simultaneously. The tapes confirmed the teachers’ conclusions. It is difficult to state which one 
category is most accurate for an activity such as building a balloon-propelled car. An argument 
was made by that teacher for the activity to fall under these NSF Student Activity categories: 

1. Working in cooperative learning groups 

2. Working on solving a real-world problem 

3. Sharing ideas or solve problems with each other in small groups 

4. Engaged in hands-on science activities 

5. Following specific instructions in an activity or investigation 

6. Design or implement their own investigation 

7. Design objects within constraints (e.g., egg drop, toothpick bridge, aluminum boats) 

8. Work on models or simulations 

In the TIMSS categories, some activities also overlap. One example is small group 
activities and lab activities. Table 4 shows the extent to which activities overlapped using these 
instruments. The two teachers who had the students doing hands-on activities during their 
lessons both reported multiple categories concurrently. To accurately describe what is going on 
in the science classroom using any questionnaire, some modification should be made to represent 
this occurrence. In the absence of the video to compare to the teacher’s responses, the 
questionnaire data is confusing, because the total time reported in activities exceeds the total time of the 
class. For statistical analysis, the time reported for an activity could be equally divided among all 
the possible descriptive categories. However, that may not yield the best descriptive data. 
Another method should be developed for reporting instructional activities that are described by 
multiple categories. 
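
The equal-division option mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure used in this study: the function name and the example lesson are hypothetical, and the category labels are borrowed from the TIMSS item in Appendix A.

```python
from collections import defaultdict


def split_equally(activities):
    """Divide each activity's minutes equally among the categories it was reported under."""
    totals = defaultdict(float)
    for minutes, categories in activities:
        share = minutes / len(categories)
        for category in categories:
            totals[category] += share
    return dict(totals)


# Hypothetical lesson: one 30-minute hands-on task reported under two TIMSS
# categories, plus a 10-minute review reported under one.
lesson = [
    (30, ["Small group activities", "Student laboratory or hands-on session"]),
    (10, ["Review of previous lesson(s)"]),
]

per_category = split_equally(lesson)
print(per_category)
print(sum(per_category.values()))  # equals the 40 minutes of reported activity
```

Dividing the time this way keeps the category totals equal to the length of the lesson, but it treats every category as an equally good description of the activity, which is the descriptive weakness noted above.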

Of the instruments used, the NSF students’ activities had too many overlapping 
categories; the TIMSS did not have enough descriptive categories, although some still 
overlapped. The NSF teacher categories were also seen as overlapping by the teachers in our 
study. 

Report the effect size in future TIMSS research, as well as whether or not the differences 
are statistically significant. The sample sizes in the TIMSS populations will almost automatically 
find some statistically significant differences. The practical significance reports the amount of 
variance accounted for in the total variance. This would indicate how much impact a significant 
difference has in the total system. 
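
The point about sample size can be illustrated with the standard relation between an F ratio and partial η²: for a fixed effect size, F = [η² / (1 − η²)] × (df2 / df1), so F grows in proportion to the error degrees of freedom. The Python sketch below is illustrative only. The error degrees of freedom of 5,000 is a hypothetical figure, not a TIMSS sample size; the function name is an assumption; and the critical values in the comments are standard tabled values for α = .05.

```python
def f_from_eta_squared(eta_sq, df_effect, df_error):
    """F ratio implied by a given partial eta squared and degrees of freedom."""
    return (eta_sq / (1.0 - eta_sq)) * (df_error / df_effect)


SMALL_EFFECT = 0.01  # Cohen's "small" effect size

# Tabled critical F values at alpha = .05: about 6.61 for (1, 5) df and
# about 3.84 for (1, 5000) df.
small_sample_f = f_from_eta_squared(SMALL_EFFECT, 1, 5)       # df of a six-teacher design
large_sample_f = f_from_eta_squared(SMALL_EFFECT, 1, 5000)    # hypothetical large survey

print(round(small_sample_f, 3), small_sample_f > 6.61)
print(round(large_sample_f, 1), large_sample_f > 3.84)
```

In this sketch the same small effect falls far short of significance with 5 error degrees of freedom and far exceeds the critical value with 5,000, which is why effect size should be reported alongside the significance test.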

Recruit more science educator involvement in the development of science teacher survey 
instruments. There is a “Special Mathematics Consultant” listed in Science Achievement in the 
Middle School Years (Beaton et al., 1996), but there is no corresponding science consultant. The 
science teacher questionnaire could better reflect the science education reform ideas currently 
being incorporated into classrooms. The NSF questions reflect these ideas much more than the 
TIMSS instrument. The TIMSS science teacher questionnaire never mentioned inquiry or 
constructivist learning taking place in the science classroom. There is no way to differentiate 
between a “cookbook” lab and an inquiry lab activity. Other strategies that are part of effective 
science teaching, such as use of the multiple intelligences, questioning techniques, wait time, 
addressing naive science, and the impact of recent research findings about the brain, were not 
included in science teacher questions. These areas should be included in future national and 
international research. 

Include organizations other than the IEA in the discussion of international science 
education research. We should not let the current emphasis on TIMSS, some of it due to the 
considerable investment made by the U.S., distract us from closely examining other international 
educational research efforts. The Organization for Economic Cooperation and Development 
(OECD) conducted a study with a different goal at the same time, and with 13 of the TIMSS 
countries: Australia (Tasmania), Austria, Canada (British Columbia, Ontario), France, Germany, 
Ireland, Japan, the Netherlands, Norway, the United Kingdom (Scotland), Spain, Switzerland, 
and the U.S. Atkins and Black (1997) have described the general dissatisfaction with current 
science and math educational practices and a movement toward reform of those standards and 
practices among those 13 countries. 

Develop and test methods of data collection that minimize the difference between the 
reported and observed instructional activities. Further research should be conducted to determine 
more precisely the validity of even using such recalled, quantitative data from teachers on an 
international basis. Asking a teacher to recall a previously taught class, and then quantify the 
time spent in categories of activities produces data that is potentially very different from what 
actually took place. Since there appears to be a gap between reported and observed instructional 
activities in this small sample of teachers from the U.S., and teacher uncertainty about the 
definitions of terms, could there also be such gaps in other countries? We have no data to 
determine if the difference is greater or smaller. We should look closely at the narrative 
description of how data was collected before interpreting its relevancy or significance. 

Expand this study to determine the reliability of an instrument for use in large-scale data 
collection. The size of the TIMSS teacher samples will result in “significantly different” results. 
We should be able to review documentation showing the accuracy of the instrument in eliciting 
accurate data from teachers. This six-teacher sample is too small to determine if the differences 
reported can be generalized to the larger population of eighth grade science teachers. 

Pilot the use of the NSF instructional categories in an international study to determine the 
reliability of international teacher responses prior to a large-scale study. 

APPENDIX A “How Did the Lesson Proceed?” 

[Directions to the teacher:] Think of the last <lesson> in which you taught science to your science class. 
(If this lesson was atypical, e.g., an examination period or field trip, pick the previous one.) 

14a. How did the lesson proceed? 

The following presents a list of activities that may occur during a lesson. Although the list is not 
exhaustive of what happens in a classroom, most classroom activities may be considered as variations of 
those listed below. Using this list, indicate how your lesson developed. In the blanks on the right, . . . 
estimate the amount of time you spent on each one. Ignore activities you used that do not fit into the 
descriptions listed. Write in the . . . approximate numbers of minutes for each activity. NOTE: If you did 
not do a certain activity write zero in the blank next to it. 

a. Review of previous lesson(s) 

b. A short quiz or test to review previous lesson 

c. Oral recitation or drill (students responding aloud) 

d. Time (in minutes) spent on review or correction of previous lesson's homework 

e. Introduction of a topic (class discussion, teacher explanation/demonstration, film, 
video, use of concrete materials etc.) 

f. Development of a topic (class discussion, teacher explanation/demonstration, group problem 
solving, film, video, etc.) 

g. Small group activities (with or without teacher) 

h. Students do paper-and-pencil exercises related to topic (not the same as homework) 

i. Assignment of student homework 

j. Students work on homework in class 

k. Student laboratory or data collection activity (not a separate laboratory hour) or 
hands-on session 

Note: TIMSS Questionnaire Item BTBSTMO 1-11 (IEA, 1995b). 